Goal

Consider the time to blindness of both eyes in diabetic patients, where one eye is randomly assigned to laser treatment and the other eye serves as a control. Let \(T_1\) and \(T_2\) denote the true failure times (in months) of the treated and untreated eye in the diabeteic retinopathy dataset. Our goal here is to estimate the conditional survival of the treated eye (\(T_1\)) conditional on either failure or survival of the untreated eye (\(T_2\)). Specifically, we estimate both \(P(T_1>60 \mid T_2\leq 36, Z)\) and \(P(T_1>60 \mid T_2> 36, Z)\) using the bivariate pseudo-observations approach. That is, we are interested in the conditional probability that the treated eye survives more than five years, given that the untreated eye has either failed or survived during the first three years, given the covariates.

For this purpose, we use our bivariate pseudo-observations approach based on the Dabrowska estimator, and the logit link function, to estimate the covariate-adjusted bivariate survival probabilities at the three time points \(\{(60,36), (60,0), (0,36)\}\). Consequently, we obtain estimates of the two conditional probabilities \(S_{T_1| T_2\leq 36}(60\mid Z)\) and \(S_{T_1| T_2>36}(60\mid Z)\) using the relationships: \[\begin{align*} S_{T_1| T_2\leq 36}(60\mid Z)&=P(T_1>60 \mid T_2\leq 36, Z)=1-\frac{S(60,36 \mid Z)-S(60,0 \mid Z)-S(0,36 \mid Z)+1}{1-S(0,36 \mid Z)}, \\ S_{T_1| T_2>36}(60\mid Z)&=P(T_1>60 \mid T_2> 36, Z)=\frac{S(60,36 \mid Z)}{S(0,36 \mid Z)}. \end{align*}\]

Data analysis

Load relevant R libraries and the bivaraite pseudo-observations functions

library(tidyr)
library(plotly)
source("pseudo_observations_functions.R")

Load the diabetic retinopaty dataset

Read the dataset (in wide format).

obs <- read.csv("retinopathy_data.csv")
head(obs)
##   id  obs1  obs2 delta1 delta2 age mean_risk     type
## 1  5 46.23 46.23      0      0  28       9.0    adult
## 2 14 42.50 31.30      0      1  12       7.0 juvenile
## 3 16 42.27 42.27      0      0   9      11.0 juvenile
## 4 25 20.60 20.60      0      0   9      11.0 juvenile
## 5 29 38.77  0.30      0      1  13       9.5 juvenile
## 6 46 65.23 54.27      0      1  12       9.0 juvenile

Note that obs1 and obs2 corresponds to the observed times (in months) of the treated and untreated eye, respectively.

Define the bivariate time points and calculate the pseudo-observations at these time points

t0 <- data.frame(t1=c(60,60,0),t2=c(36,0,36))
t0_merged <- paste0(t0[,1],",",t0[,2], sep = "")
obs <- PO_func_dabrowska_mult_new(obs,t0)
head(obs)
##   id  obs1  obs2 delta1 delta2 age mean_risk     type         PO_t1     PO_t2
## 1  1 46.23 46.23      0      0  28       9.0    adult  1.001560e+00 1.0072337
## 2  2 42.50 31.30      0      1  12       7.0 juvenile -8.221699e-02 1.0072337
## 3  3 42.27 42.27      0      0   9      11.0 juvenile  9.780348e-01 0.9923951
## 4  4 20.60 20.60      0      0   9      11.0 juvenile  7.309823e-01 0.8642561
## 5  5 38.77  0.30      0      1  13       9.5 juvenile  1.705303e-13 0.9672061
## 6  6 65.23 54.27      0      1  12       9.0 juvenile  1.075136e+00 1.0541096
##           PO_t3
## 1  1.024552e+00
## 2 -9.364188e-02
## 3  1.024552e+00
## 4  8.367084e-01
## 5  1.421085e-14
## 6  1.024552e+00

Convert the data into a long format where each patient now has K=3 rows

n <- nrow(obs)
obs_long <- pivot_longer(obs,cols = starts_with("PO_t"))
obs_long$times <- rep(t0_merged,n)
head(obs_long,n = 9)
## # A tibble: 9 × 11
##      id  obs1  obs2 delta1 delta2   age mean_risk type     name    value times
##   <int> <dbl> <dbl>  <int>  <int> <int>     <dbl> <chr>    <chr>   <dbl> <chr>
## 1     1  46.2  46.2      0      0    28         9 adult    PO_t1  1.00   60,36
## 2     1  46.2  46.2      0      0    28         9 adult    PO_t2  1.01   60,0 
## 3     1  46.2  46.2      0      0    28         9 adult    PO_t3  1.02   0,36 
## 4     2  42.5  31.3      0      1    12         7 juvenile PO_t1 -0.0822 60,36
## 5     2  42.5  31.3      0      1    12         7 juvenile PO_t2  1.01   60,0 
## 6     2  42.5  31.3      0      1    12         7 juvenile PO_t3 -0.0936 0,36 
## 7     3  42.3  42.3      0      0     9        11 juvenile PO_t1  0.978  60,36
## 8     3  42.3  42.3      0      0     9        11 juvenile PO_t2  0.992  60,0 
## 9     3  42.3  42.3      0      0     9        11 juvenile PO_t3  1.02   0,36

Fit the bivariate proportional odds model (i.e. logistic regression model)

set.seed(123)
fit <- geese(value~factor(times, levels=t0_merged)+age+mean_risk+type,
                id=id, data=obs_long,scale.fix=TRUE,family=gaussian,
                jack=TRUE, mean.link="logit",corstr="independence",)
betas <- fit$beta
summary(fit)
## 
## Call:
## geese(formula = value ~ factor(times, levels = t0_merged) + age + 
##     mean_risk + type, id = id, data = obs_long, family = gaussian, 
##     mean.link = "logit", scale.fix = TRUE, corstr = "independence", 
##     jack = TRUE)
## 
## Mean Model:
##  Mean Link:                 logit 
##  Variance to Mean Relation: gaussian 
## 
##  Coefficients:
##                                           estimate    san.se     ajs.se
## (Intercept)                            1.657976764 1.2772363 1.29241291
## factor(times, levels = t0_merged)60,0  1.133480665 0.1467074 0.14483368
## factor(times, levels = t0_merged)0,36  0.508826212 0.1071158 0.10573533
## age                                   -0.006300588 0.0159417 0.01617585
## mean_risk                             -0.177551413 0.1112872 0.11270522
## typejuvenile                          -0.126652196 0.4917446 0.49597162
##                                              wald            p
## (Intercept)                            1.68505619 1.942540e-01
## factor(times, levels = t0_merged)60,0 59.69310389 1.110223e-14
## factor(times, levels = t0_merged)0,36 22.56482185 2.031709e-06
## age                                    0.15620437 6.926754e-01
## mean_risk                              2.54541236 1.106150e-01
## typejuvenile                           0.06633553 7.967489e-01
## 
## Scale is fixed.
## 
## Correlation Model:
##  Correlation Structure:     independence 
## 
## Returned Error Value:    0 
## Number of clusters:   197   Maximum cluster size: 3

Calculate the predicted values at each of the three bivariate time points

des_mat <- model.matrix(value~factor(times, levels=t0_merged)+age+mean_risk+type, data = obs_long)
pred_values <- data.frame('Estimated survival'=exp(des_mat%*%betas)/(1+exp(des_mat%*%betas)),type=obs_long$type, Z=obs_long$age, X=obs_long$mean_risk, time=obs_long$times)
pred_1 <- pred_values[pred_values$time=="60,36",]
pred_2 <- pred_values[pred_values$time=="60,0",]
pred_3 <- pred_values[pred_values$time=="0,36",]

Calculate \(P(T_1>60 \mid T_2\leq 36, Z)\)

cond_prob1 <- pred_1
cond_prob1$Estimated.survival <- 1-(pred_1$Estimated.survival-pred_2$Estimated.survival-pred_3$Estimated.survival+1)/(1-pred_3$Estimated.survival)

Plot \(P(T_1>60 \mid T_2\leq 36, Z)\)

cond_prob1 <- cond_prob1[order(cond_prob1$Z,cond_prob1$X),]
p <- plot_ly(x=cond_prob1$Z, y=cond_prob1$X, z=cond_prob1$Estimated.survival, type="scatter3d", mode="markers",color = cond_prob1$Estimated.survival)
p <- layout(p, scene = list(xaxis = list(title = "Age at diagnosis", range = c(0,60)), yaxis = list(title = "Risk score", range = c(6,12)), zaxis = list(title = "P(T1>5|T2<3)",range = c(0,1))))
p

As can be seen in this interactive figure, the covariate-adjusted conditional probability \(P(T_1>60 \mid T_2\leq 36, Z)\) ranges between 0.5 and 0.77, meaning that even if the untreated eye failed during the first three years, the treated eye still has a relatively high chance to survive more than five years. Additionally, this conditional survival probability of the treated eye is highest as the risk score is lowest, and it decreases as the risk score increases. Age at diagnosis of diabetes seems to also have some effect on this conditional survival probability, which decreases as age at diagnosis increases.

Calculate \(P(T_1>60 \mid T_2> 36, Z)\)

cond_prob2 <- pred_1
cond_prob2$Estimated.survival <- pred_1$Estimated.survival/pred_3$Estimated.survival

Plot \(P(T_1>60 \mid T_2> 36, Z)\)

cond_prob2 <- cond_prob2[order(cond_prob2$Z,cond_prob2$X),]
p <- plot_ly(x=cond_prob2$Z, y=cond_prob2$X, z=cond_prob2$Estimated.survival, type="scatter3d", mode="markers",color = cond_prob2$Estimated.survival)
p <- layout(p, scene = list(xaxis = list(title = "Age at diagnosis", range = c(0,60)), yaxis = list(title = "Risk score", range = c(6,12)), zaxis = list(title = "P(T1>5|T2>3)",range = c(0,1))))
p

As can be seen in this interactive figure, the covariate-adjusted conditional probability \(P(T_1>60 \mid T_2> 36, Z)\) ranges between 0.73 and 0.84, meaning that when the untreated eye survives for more than three years, the treated eye has a very high chance to survive more than five years. Interestingly, this conditional survival probability remains high even when the age at diagnosis increases, and is almost not affected by the age at diagnosis. However, this conditional probability does decrease as the risk score increases, yet this effect is less substantial than before (when compared to the case where \(T_2\leq 36\)).

Summary

In summary, when the untreated eye survives for more than three years, the probability that the treated eye will survive more than five years is very high and is almost not affected by the covariates. However, when the untreated eye fails during the first three years, the probability that the treated eye will survive more than five years is highest when the risk score is lowest and when age at diagnosis is lowest, and it decreases both as a function of age at diagnosis and risk score.